Automatic construction of the dialog tree based on unmarked text corpora in Russian
Abstract
We propose a method that automatically determines the tree structure and the key topics of its nodes when building a dialog tree from unlabeled text corpora. Building a dialog tree is one of the labor-intensive tasks in creating an automatic dialog system; in most cases it relies on manual annotation, which requires considerable time and resources. The hierarchical dialog clustering method takes into account the semantic proximity of messages, allows a different number of nodes to be allocated at each level of the hierarchy, and limits the width and depth of the dialog tree. The node annotation algorithm accounts for the hierarchy of topics by building thematic chains. The method combines natural language processing techniques (tokenization, lemmatization, part-of-speech tagging, word embeddings, etc.), principal component analysis for dimensionality reduction, and cluster analysis. Experiments on constructing the tree structure and annotating its nodes indicate that the proposed method is a viable approach to automatic dialog tree construction. On a reference dialog tree containing 13 nodes at the first level, 381 nodes at the second level, and 299 nodes at the third level, the recognition accuracy was 0.8, 0.7, and 0.5, respectively. Automatic construction of dialog trees can be useful in developing automatic dialog systems and in improving the quality of generated answers to user questions.
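The two ideas named in the abstract, width- and depth-limited hierarchical clustering and topic labels built as thematic chains, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the clustering algorithm, so a simple greedy cosine-threshold clustering stands in for it here. The function names, the per-level `thresholds`, the `max_width` parameter, and the toy 2-D vectors (which stand in for word embeddings already reduced by principal component analysis) are all assumptions made for illustration.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def greedy_clusters(items, threshold, max_width):
    """Assumed stand-in clustering: an item joins the most similar cluster
    if it is similar enough, or if the width limit is already reached;
    otherwise it opens a new cluster. Items are (lemmas, vector) pairs."""
    clusters = []
    for item in items:
        sims = [cosine(item[1], centroid([v for _, v in c])) for c in clusters]
        best = max(range(len(sims)), key=sims.__getitem__, default=None)
        if best is not None and (sims[best] >= threshold or len(clusters) >= max_width):
            clusters[best].append(item)
        else:
            clusters.append([item])
    return clusters

def build_tree(items, thresholds, max_width, chain=()):
    """Recursively cluster messages. Depth is limited by len(thresholds),
    width by max_width. Each node's topic is its most frequent lemma that
    does not already appear in the chain of ancestor topics."""
    nodes = []
    for cluster in greedy_clusters(items, thresholds[len(chain)], max_width):
        counts = Counter(l for lemmas, _ in cluster for l in lemmas if l not in chain)
        if not counts:
            continue  # no new lemma left to extend the thematic chain
        topic = counts.most_common(1)[0][0]
        node = {"topic": topic, "size": len(cluster), "children": []}
        if len(chain) + 1 < len(thresholds) and len(cluster) > 1:
            node["children"] = build_tree(cluster, thresholds, max_width, chain + (topic,))
        nodes.append(node)
    return nodes

# Toy input: lemmatized messages with hypothetical 2-D embeddings.
messages = [
    (["card", "block"], [1.0, 0.10]),
    (["card", "limit"], [1.0, 0.60]),
    (["card", "block"], [1.0, 0.12]),
    (["loan", "rate"], [0.1, 1.00]),
    (["loan", "term"], [0.2, 1.00]),
]

tree = build_tree(messages, thresholds=(0.8, 0.95), max_width=3)
for node in tree:
    print(node["topic"], [child["topic"] for child in node["children"]])
# card ['block', 'limit']
# loan ['rate']
```

Tightening the similarity threshold at each level is one way to let the number of nodes differ between levels; a production pipeline would replace the toy vectors with real embeddings and the greedy step with whichever clustering method is actually used.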